Francesco Robortello, «In librum Aristotelis De arte poetica explicationes»
In the context of the census of sixteenth-century editions and commentaries of Aristotle's Poetics, we publish the record of Francesco Robortello's Explicationes.
Video Motion: Finding Complete Motion Paths for Every Visible Point
The problem of understanding motion in video has been an area of intense research in computer vision for decades. The traditional approach is to represent motion using optical flow fields, which describe the two-dimensional instantaneous velocity at every pixel in every frame. We present a new approach to describing motion in video in which each visible world point is associated with a sequence-length video motion path. A video motion path lists the location where a world point would appear if it were visible in every frame of the sequence. Each motion path is coupled with a vector of binary visibility flags for the associated point that identify the frames in which the tracked point is unoccluded.

We represent paths for all visible points in a particular sequence using a single linear subspace. The key insight we exploit is that, for many sequences, this subspace is low-dimensional, scaling with the complexity of the deformations and the number of independent objects in the scene, rather than the number of frames in the sequence. Restricting all paths to lie within a single motion subspace provides strong regularization that allows us to extend paths through brief occlusions, relying on evidence from the visible frames to hallucinate the unseen locations.

This thesis presents our mathematical model of video motion. We define a path objective function that optimizes a set of paths given estimates of visible intervals, under the assumption that motion is generally spatially smooth and that the appearance of a tracked point remains constant over time. We estimate visibility based on global properties of all paths, enforcing the physical requirement that at least one tracked point must be visible at every pixel in the video. The model assumes the existence of an appropriate path motion basis; we find a sequence-specific basis through analysis of point tracks from a frame-to-frame tracker. Tracking failures caused by image noise, non-rigid deformations, or occlusions complicate the problem by introducing missing data. We update standard trackers to aggressively reinitialize points lost in earlier frames. Finally, we improve on standard Principal Component Analysis with missing data by introducing a novel compaction step that associates these relocalized points, reducing the amount of missing data that must be overcome. The full system achieves state-of-the-art results, recovering dense, accurate, long-range point correspondences in the face of significant occlusions.

Dissertation
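The low-dimensional path subspace at the heart of this abstract can be sketched in a few lines. The following is a minimal illustration, not the thesis's actual pipeline: it assumes a track matrix `W` of shape (2F, P) holding the x/y coordinates of P tracked points over F frames, with NaN marking unobserved entries, fits a rank-K basis from the fully observed tracks via SVD, and completes partial paths by least-squares projection onto that basis.

```python
# Minimal sketch of the low-dimensional path-subspace idea, not the
# thesis's actual algorithm. W has shape (2F, P): x/y coordinates of
# P tracked points over F frames, with NaN marking unobserved entries.
import numpy as np

def fit_path_basis(W, rank):
    """Estimate a rank-K path basis from fully observed tracks.
    Assumes at least a few tracks span the whole sequence."""
    complete = ~np.isnan(W).any(axis=0)      # columns with no missing data
    U, _, _ = np.linalg.svd(W[:, complete], full_matrices=False)
    return U[:, :rank]                       # (2F, K) basis of motion paths

def complete_paths(W, B):
    """Extend each partially observed path through its missing frames by
    projecting its visible entries onto the basis B (least squares)."""
    W_full = W.copy()
    for j in range(W.shape[1]):
        vis = ~np.isnan(W[:, j])
        coef, *_ = np.linalg.lstsq(B[vis], W[vis, j], rcond=None)
        W_full[:, j] = B @ coef              # hallucinate occluded locations
    return W_full
```

Because the basis scales with scene complexity rather than sequence length, even heavily occluded tracks are constrained by a small number of coefficients, which is what lets the visible frames carry the occluded ones.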
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
This paper introduces a video dataset of spatio-temporally localized Atomic
Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual
actions in 430 15-minute video clips, where actions are localized in space and
time, resulting in 1.58M action labels with multiple labels per person
occurring frequently. The key characteristics of our dataset are: (1) the
definition of atomic visual actions, rather than composite actions; (2) precise
spatio-temporal annotations with possibly multiple annotations for each person;
(3) exhaustive annotation of these atomic actions over 15-minute video clips;
(4) people temporally linked across consecutive segments; and (5) using movies
to gather a varied set of action representations. This departs from existing
datasets for spatio-temporal action recognition, which typically provide sparse
annotations for composite actions in short video clips. We will release the
dataset publicly.
AVA, with its realistic scene and action complexity, exposes the intrinsic
difficulty of action recognition. To benchmark this, we present a novel
approach for action localization that builds upon the current state-of-the-art
methods, and demonstrates better performance on JHMDB and UCF101-24 categories.
While setting a new state of the art on existing datasets, the overall results
on AVA are low at 15.6% mAP, underscoring the need for developing new
approaches for video understanding.

Comment: To appear in CVPR 2018. Check the dataset page
https://research.google.com/ava/ for details.
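To make the annotation structure described above concrete (multiple action labels per person, persons linked by identity), here is a hedged sketch of loading AVA-style annotations. The column layout `video_id, timestamp, x1, y1, x2, y2, action_id, person_id` with boxes normalized to [0, 1] is our assumption about the released CSV format; consult the dataset page for the authoritative schema.

```python
# Hedged sketch of reading AVA-style annotation rows; the assumed columns are
# video_id, timestamp, x1, y1, x2, y2, action_id, person_id (normalized boxes).
import csv
from collections import defaultdict

def load_ava(path):
    """Group multi-label boxes by (video, timestamp, person)."""
    labels = defaultdict(list)
    with open(path, newline='') as f:
        for vid, t, x1, y1, x2, y2, action, person in csv.reader(f):
            key = (vid, float(t), int(person))
            box = tuple(map(float, (x1, y1, x2, y2)))
            labels[key].append((box, int(action)))  # one person, many actions
    return labels
```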
Articulated motion discovery using pairs of trajectories
We propose an unsupervised approach for discovering characteristic motion patterns in videos of highly articulated objects performing natural, unscripted behaviors, such as tigers in the wild. We discover consistent patterns in a bottom-up manner by analyzing the relative displacements of large numbers of ordered trajectory pairs through time, such that each trajectory is attached to a different moving part on the object. The pairs-of-trajectories descriptor relies entirely on motion and is more discriminative than state-of-the-art features that employ single trajectories. Our method generates temporal video intervals, each automatically trimmed to one instance of the discovered behavior, and clusters them by type (e.g., running, turning head, drinking water). We present experiments on two datasets: dogs from YouTube-Objects and a new dataset of National Geographic tiger videos. Results confirm that our proposed descriptor outperforms existing appearance- and trajectory-based descriptors (e.g., HOG and DTFs) on both datasets and enables us to segment unconstrained animal video into intervals containing single behaviors.
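As a rough illustration of the pairs-of-trajectories idea, the sketch below builds a motion-only signature from the relative displacement of one ordered track pair. The scale normalization and the 8-bin direction histogram are our simplifications for exposition, not the paper's exact descriptor.

```python
# Illustrative sketch: describe motion by the relative displacement of an
# ordered pair of point tracks over time. Binning choices are ours, not
# the paper's exact PoT descriptor.
import numpy as np

def pot_descriptor(track_a, track_b):
    """track_a, track_b: (F, 2) arrays of (x, y) positions over F frames.
    Returns a motion-only histogram of relative-displacement directions."""
    rel = track_b - track_a                      # relative position per frame
    drel = np.diff(rel, axis=0)                  # change in relative position
    scale = np.linalg.norm(rel[:-1], axis=1, keepdims=True) + 1e-8
    drel = drel / scale                          # rough invariance to scale
    angles = np.arctan2(drel[:, 1], drel[:, 0])  # direction of relative motion
    mags = np.linalg.norm(drel, axis=1)          # magnitude of relative motion
    hist, _ = np.histogram(angles, bins=8, range=(-np.pi, np.pi), weights=mags)
    return hist / (hist.sum() + 1e-8)            # appearance-free signature
```

Because the signature depends only on how the two parts move relative to each other, it stays stable across animals with very different appearance, which is what the clustering step relies on.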
Behavior Discovery and Alignment of Articulated Object Classes from Unstructured Video
We propose an automatic system for organizing the content of a collection of
unstructured videos of an articulated object class (e.g. tiger, horse). By
exploiting the recurring motion patterns of the class across videos, our
system: 1) identifies its characteristic behaviors; and 2) recovers
pixel-to-pixel alignments across different instances. Our system can be useful
for organizing video collections for indexing and retrieval. Moreover, it can
be a platform for learning the appearance or behaviors of object classes from
Internet video. Traditional supervised techniques cannot exploit this wealth of
data directly, as they require a large amount of time-consuming manual
annotations.
The behavior discovery stage generates temporal video intervals, each
automatically trimmed to one instance of the discovered behavior, clustered by
type. It relies on our novel motion representation for articulated motion based
on the displacement of ordered pairs of trajectories (PoTs). The alignment
stage aligns hundreds of instances of the class to a great accuracy despite
considerable appearance variations (e.g. an adult tiger and a cub). It uses a
flexible Thin Plate Spline deformation model that can vary through time. We
carefully evaluate each step of our system on a new, fully annotated dataset.
On behavior discovery, we outperform the state-of-the-art Improved DTF
descriptor. On spatial alignment, we outperform the popular SIFT Flow
algorithm.

Comment: 19 pages, 19 figures, 3 tables. arXiv admin note: substantial text
overlap with arXiv:1411.788
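To make the alignment stage concrete, here is a minimal thin plate spline warp using SciPy's RBFInterpolator with its thin-plate-spline kernel. It fits a static TPS from a handful of correspondences and applies it to query points; the paper's time-varying TPS model and its correspondence estimation are not reproduced here, and the toy correspondences are invented for the example.

```python
# Minimal static thin plate spline warp, sketched with SciPy; the paper's
# time-varying TPS and correspondence estimation are not shown.
import numpy as np
from scipy.interpolate import RBFInterpolator

def tps_warp(src_pts, dst_pts, query_pts, smoothing=1e-3):
    """Fit a TPS mapping src_pts -> dst_pts and apply it to query_pts.
    src_pts, dst_pts: (N, 2) correspondences; query_pts: (M, 2) points."""
    tps = RBFInterpolator(src_pts, dst_pts,
                          kernel='thin_plate_spline', smoothing=smoothing)
    return tps(query_pts)

# Toy usage: warp a regular grid using a few hand-picked correspondences.
src = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]], dtype=float)
dst = src + np.array([[0.05, 0.0], [0.0, -0.03], [0.02, 0.04],
                      [-0.05, 0.0], [0.1, 0.1]])
grid = np.stack(np.meshgrid(np.linspace(0, 1, 5),
                            np.linspace(0, 1, 5)), -1).reshape(-1, 2)
warped = tps_warp(src, dst, grid)
```

A small nonzero smoothing term keeps the warp well behaved when correspondences are noisy, which matters when aligning instances with large appearance variation such as an adult tiger and a cub.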